Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 4319 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 371.3 KiB |
| Average record size in memory | 88.0 B |
Variable types
| Numeric | 11 |
|---|---|
Alerts
| Alert | Type |
|---|---|
| df_index is highly correlated with purchases_no and 1 other field | High correlation |
| gross_revenue is highly correlated with purchases_no and 3 other fields | High correlation |
| recency_days is highly correlated with purchases_no and 1 other field | High correlation |
| purchases_no is highly correlated with df_index and 6 other fields | High correlation |
| products_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| items_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| frequency is highly correlated with purchases_no and 1 other field | High correlation |
| returns_no is highly correlated with satisfaction_rate | High correlation |
| satisfaction_rate is highly correlated with returns_no | High correlation |
| recorrence is highly correlated with df_index and 6 other fields | High correlation |
| df_index is highly correlated with recorrence | High correlation |
| gross_revenue is highly correlated with purchases_no and 1 other field | High correlation |
| purchases_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| products_no is highly correlated with purchases_no | High correlation |
| items_no is highly correlated with gross_revenue and 1 other field | High correlation |
| frequency is highly correlated with recorrence | High correlation |
| recorrence is highly correlated with df_index and 2 other fields | High correlation |
| gross_revenue is highly correlated with purchases_no and 3 other fields | High correlation |
| purchases_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| products_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| items_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| returns_no is highly correlated with satisfaction_rate | High correlation |
| satisfaction_rate is highly correlated with returns_no | High correlation |
| recorrence is highly correlated with gross_revenue and 3 other fields | High correlation |
| df_index is highly correlated with recency_days and 1 other field | High correlation |
| gross_revenue is highly correlated with purchases_no and 3 other fields | High correlation |
| recency_days is highly correlated with df_index | High correlation |
| purchases_no is highly correlated with gross_revenue and 4 other fields | High correlation |
| products_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| items_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| returns_no is highly correlated with gross_revenue and 3 other fields | High correlation |
| recorrence is highly correlated with df_index and 1 other field | High correlation |
| gross_revenue is highly skewed (γ1 = 20.83762299) | Skewed |
| items_no is highly skewed (γ1 = 22.20471768) | Skewed |
| returns_no is highly skewed (γ1 = 27.00404142) | Skewed |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| customer_id has unique values | Unique |
| returns_no has 2829 (65.5%) zeros | Zeros |
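The skewness alerts above report γ1, the standardized third moment of a variable. The following is a minimal sketch, assuming the plain population form m3 / m2^1.5. The report does not state which estimator it uses, and libraries often apply a bias correction, so values computed this way may differ slightly from the ones listed. The helper name `skewness` is illustrative and not part of the profiling tool.

```python
def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the second.

    Assumption: no bias correction is applied. Raises ZeroDivisionError for
    constant input, because the variance is then zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# A symmetric sample has zero skewness; one large value on the right makes it positive.
symmetric = skewness([1, 2, 3, 4, 5])
right_tailed = skewness([1, 1, 1, 1, 10])
```

A long right tail, such as a few customers with very large `gross_revenue`, pushes γ1 far above zero, which is what triggers the "Skewed" alert.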
Reproduction
| Analysis started | 2022-09-08 15:01:51.043289 |
|---|---|
| Analysis finished | 2022-09-08 15:02:23.181953 |
| Duration | 32.14 seconds |
| Software version | pandas-profiling v3.2.0 |
df_index
| Distinct | 4319 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2179.25793 |
| Minimum | 0 |
| Maximum | 4345 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 220.9 |
| Q1 | 1098.5 |
| median | 2183 |
| Q3 | 3262.5 |
| 95-th percentile | 4129.1 |
| Maximum | 4345 |
| Range | 4345 |
| Interquartile range (IQR) | 2164 |
Descriptive statistics
| Standard deviation | 1253.017453 |
|---|---|
| Coefficient of variation (CV) | 0.5749743691 |
| Kurtosis | -1.194978742 |
| Mean | 2179.25793 |
| Median Absolute Deviation (MAD) | 1082 |
| Skewness | -0.007301569894 |
| Sum | 9412215 |
| Variance | 1570052.738 |
| Monotonicity | Strictly increasing |
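Several of the figures above are derived from others in the same tables. As a quick consistency check, the sketch below recomputes the interquartile range, range, coefficient of variation and variance from the values copied from the tables above. The variable names are illustrative only.

```python
# Values copied from the quantile and descriptive statistics tables above.
q1, q3 = 1098.5, 3262.5
minimum, maximum = 0, 4345
mean, std = 2179.25793, 1253.017453

iqr = q3 - q1                    # interquartile range: Q3 - Q1
value_range = maximum - minimum  # range: maximum - minimum
cv = std / mean                  # coefficient of variation: std / mean
variance = std ** 2              # variance: square of the standard deviation

print(f"IQR={iqr}, range={value_range}, CV={cv:.10f}, variance={variance:.3f}")
```

The recomputed values agree with the reported ones up to rounding of the inputs.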
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 2909 | 1 | < 0.1% |
| 2895 | 1 | < 0.1% |
| 2896 | 1 | < 0.1% |
| 2897 | 1 | < 0.1% |
| 2898 | 1 | < 0.1% |
| 2899 | 1 | < 0.1% |
| 2900 | 1 | < 0.1% |
| 2901 | 1 | < 0.1% |
| 2902 | 1 | < 0.1% |
| Other values (4309) | 4309 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 4345 | 1 | |
| 4344 | 1 | |
| 4343 | 1 | |
| 4342 | 1 | |
| 4341 | 1 | |
| 4340 | 1 | |
| 4339 | 1 | |
| 4338 | 1 | |
| 4337 | 1 | |
| 4336 | 1 |
customer_id
| Distinct | 4319 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15300.5059 |
| Minimum | 12347 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 12347 |
|---|---|
| 5-th percentile | 12617.9 |
| Q1 | 13815.5 |
| median | 15299 |
| Q3 | 16778.5 |
| 95-th percentile | 17980.4 |
| Maximum | 18287 |
| Range | 5940 |
| Interquartile range (IQR) | 2963 |
Descriptive statistics
| Standard deviation | 1720.328769 |
|---|---|
| Coefficient of variation (CV) | 0.1124360711 |
| Kurtosis | -1.194800323 |
| Mean | 15300.5059 |
| Median Absolute Deviation (MAD) | 1482 |
| Skewness | 0.001466012037 |
| Sum | 66082885 |
| Variance | 2959531.075 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17850 | 1 | < 0.1% |
| 17299 | 1 | < 0.1% |
| 15014 | 1 | < 0.1% |
| 14765 | 1 | < 0.1% |
| 16869 | 1 | < 0.1% |
| 15909 | 1 | < 0.1% |
| 13618 | 1 | < 0.1% |
| 16050 | 1 | < 0.1% |
| 17879 | 1 | < 0.1% |
| 17562 | 1 | < 0.1% |
| Other values (4309) | 4309 |
| Value | Count | Frequency (%) |
| 12347 | 1 | |
| 12348 | 1 | |
| 12349 | 1 | |
| 12350 | 1 | |
| 12352 | 1 | |
| 12353 | 1 | |
| 12354 | 1 | |
| 12355 | 1 | |
| 12356 | 1 | |
| 12357 | 1 |
| Value | Count | Frequency (%) |
| 18287 | 1 | |
| 18283 | 1 | |
| 18282 | 1 | |
| 18281 | 1 | |
| 18280 | 1 | |
| 18278 | 1 | |
| 18277 | 1 | |
| 18276 | 1 | |
| 18273 | 1 | |
| 18272 | 1 |
gross_revenue
Real number (ℝ≥0)
High correlation, Skewed
| Distinct | 4239 |
|---|---|
| Distinct (%) | 98.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1971.368585 |
| Minimum | 3.75 |
| Maximum | 279138.02 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 3.75 |
|---|---|
| 5-th percentile | 112.313 |
| Q1 | 306.635 |
| median | 668.36 |
| Q3 | 1632.775 |
| 95-th percentile | 5726.869 |
| Maximum | 279138.02 |
| Range | 279134.27 |
| Interquartile range (IQR) | 1326.14 |
Descriptive statistics
| Standard deviation | 8496.008101 |
|---|---|
| Coefficient of variation (CV) | 4.309700461 |
| Kurtosis | 560.8900505 |
| Mean | 1971.368585 |
| Median Absolute Deviation (MAD) | 463.26 |
| Skewness | 20.83762299 |
| Sum | 8514340.92 |
| Variance | 72182153.66 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 76.32 | 4 | 0.1% |
| 35.4 | 3 | 0.1% |
| 113.5 | 3 | 0.1% |
| 440 | 3 | 0.1% |
| 363.65 | 3 | 0.1% |
| 110.38 | 2 | < 0.1% |
| 324.24 | 2 | < 0.1% |
| 248.61 | 2 | < 0.1% |
| 251.21 | 2 | < 0.1% |
| 112.75 | 2 | < 0.1% |
| Other values (4229) | 4293 |
| Value | Count | Frequency (%) |
| 3.75 | 1 | |
| 5.9 | 1 | |
| 12.75 | 1 | |
| 15 | 2 | |
| 17 | 1 | |
| 20.8 | 2 | |
| 25.5 | 1 | |
| 30 | 1 | |
| 30.6 | 1 | |
| 32.65 | 1 |
| Value | Count | Frequency (%) |
| 279138.02 | 1 | |
| 259657.3 | 1 | |
| 194550.79 | 1 | |
| 140450.72 | 1 | |
| 124564.53 | 1 | |
| 117379.63 | 1 | |
| 91062.38 | 1 | |
| 72882.09 | 1 | |
| 66653.56 | 1 | |
| 65039.62 | 1 |
recency_days
| Distinct | 304 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 92.0349618 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 34 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 17 |
| median | 50 |
| Q3 | 142 |
| 95-th percentile | 311 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 125 |
Descriptive statistics
| Standard deviation | 100.0708639 |
|---|---|
| Coefficient of variation (CV) | 1.087313581 |
| Kurtosis | 0.4317306489 |
| Mean | 92.0349618 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | 1.246543803 |
| Sum | 397499 |
| Variance | 10014.1778 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 103 | 2.4% |
| 4 | 94 | 2.2% |
| 3 | 94 | 2.2% |
| 2 | 90 | 2.1% |
| 8 | 79 | 1.8% |
| 10 | 77 | 1.8% |
| 17 | 74 | 1.7% |
| 7 | 71 | 1.6% |
| 9 | 70 | 1.6% |
| 22 | 64 | 1.5% |
| Other values (294) | 3503 |
| Value | Count | Frequency (%) |
| 0 | 34 | 0.8% |
| 1 | 103 | |
| 2 | 90 | |
| 3 | 94 | |
| 4 | 94 | |
| 5 | 48 | |
| 7 | 71 | |
| 8 | 79 | |
| 9 | 70 | |
| 10 | 77 |
| Value | Count | Frequency (%) |
| 373 | 17 | |
| 372 | 17 | |
| 371 | 6 | 0.1% |
| 369 | 3 | 0.1% |
| 368 | 5 | 0.1% |
| 367 | 5 | 0.1% |
| 366 | 10 | |
| 365 | 10 | |
| 364 | 6 | 0.1% |
| 362 | 6 | 0.1% |
purchases_no
| Distinct | 56 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.259550822 |
| Minimum | 1 |
| Maximum | 206 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 5 |
| 95-th percentile | 13 |
| Maximum | 206 |
| Range | 205 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 7.657865258 |
|---|---|
| Coefficient of variation (CV) | 1.797810515 |
| Kurtosis | 244.1834183 |
| Mean | 4.259550822 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 11.95442164 |
| Sum | 18397 |
| Variance | 58.64290031 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1492 | |
| 2 | 827 | |
| 3 | 504 | 11.7% |
| 4 | 394 | 9.1% |
| 5 | 237 | 5.5% |
| 6 | 173 | 4.0% |
| 7 | 138 | 3.2% |
| 8 | 98 | 2.3% |
| 9 | 69 | 1.6% |
| 10 | 55 | 1.3% |
| Other values (46) | 332 | 7.7% |
| Value | Count | Frequency (%) |
| 1 | 1492 | |
| 2 | 827 | |
| 3 | 504 | 11.7% |
| 4 | 394 | 9.1% |
| 5 | 237 | 5.5% |
| 6 | 173 | 4.0% |
| 7 | 138 | 3.2% |
| 8 | 98 | 2.3% |
| 9 | 69 | 1.6% |
| 10 | 55 | 1.3% |
| Value | Count | Frequency (%) |
| 206 | 1 | |
| 199 | 1 | |
| 124 | 1 | |
| 97 | 1 | |
| 91 | 2 | |
| 86 | 1 | |
| 72 | 1 | |
| 62 | 2 | |
| 60 | 1 | |
| 57 | 1 |
products_no
| Distinct | 468 |
|---|---|
| Distinct (%) | 10.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 91.74878444 |
| Minimum | 1 |
| Maximum | 7838 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 17 |
| median | 41 |
| Q3 | 100 |
| 95-th percentile | 315.1 |
| Maximum | 7838 |
| Range | 7837 |
| Interquartile range (IQR) | 83 |
Descriptive statistics
| Standard deviation | 228.8353351 |
|---|---|
| Coefficient of variation (CV) | 2.494151137 |
| Kurtosis | 482.3917998 |
| Mean | 91.74878444 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 18.08994157 |
| Sum | 396263 |
| Variance | 52365.61057 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 10 | 85 | 2.0% |
| 6 | 77 | 1.8% |
| 9 | 75 | 1.7% |
| 1 | 70 | 1.6% |
| 15 | 69 | 1.6% |
| 11 | 68 | 1.6% |
| 8 | 67 | 1.6% |
| 5 | 66 | 1.5% |
| 28 | 65 | 1.5% |
| 7 | 65 | 1.5% |
| Other values (458) | 3612 |
| Value | Count | Frequency (%) |
| 1 | 70 | |
| 2 | 50 | |
| 3 | 55 | |
| 4 | 48 | |
| 5 | 66 | |
| 6 | 77 | |
| 7 | 65 | |
| 8 | 67 | |
| 9 | 75 | |
| 10 | 85 |
| Value | Count | Frequency (%) |
| 7838 | 1 | |
| 5673 | 1 | |
| 5095 | 1 | |
| 4580 | 1 | |
| 2698 | 1 | |
| 2379 | 1 | |
| 2060 | 1 | |
| 1818 | 1 | |
| 1673 | 1 | |
| 1637 | 1 |
items_no
Real number (ℝ≥0)
High correlation, Skewed
| Distinct | 1760 |
|---|---|
| Distinct (%) | 40.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1157.266728 |
| Minimum | 1 |
| Maximum | 196844 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 47 |
| Q1 | 160.5 |
| median | 379 |
| Q3 | 991.5 |
| 95-th percentile | 3542.1 |
| Maximum | 196844 |
| Range | 196843 |
| Interquartile range (IQR) | 831 |
Descriptive statistics
| Standard deviation | 4775.918514 |
|---|---|
| Coefficient of variation (CV) | 4.126895206 |
| Kurtosis | 731.6935971 |
| Mean | 1157.266728 |
| Median Absolute Deviation (MAD) | 276 |
| Skewness | 22.20471768 |
| Sum | 4998235 |
| Variance | 22809397.65 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 88 | 18 | 0.4% |
| 120 | 17 | 0.4% |
| 84 | 16 | 0.4% |
| 146 | 15 | 0.3% |
| 128 | 15 | 0.3% |
| 144 | 15 | 0.3% |
| 72 | 15 | 0.3% |
| 150 | 14 | 0.3% |
| 106 | 14 | 0.3% |
| 200 | 13 | 0.3% |
| Other values (1750) | 4167 |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 2 | 5 | |
| 3 | 3 | |
| 4 | 7 | |
| 5 | 3 | |
| 6 | 3 | |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 2 | < 0.1% |
| 10 | 5 |
| Value | Count | Frequency (%) |
| 196844 | 1 | |
| 80263 | 1 | |
| 77373 | 1 | |
| 69993 | 1 | |
| 64549 | 1 | |
| 64124 | 1 | |
| 63312 | 1 | |
| 58343 | 1 | |
| 57885 | 1 | |
| 50255 | 1 |
frequency
| Distinct | 1226 |
|---|---|
| Distinct (%) | 28.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4028513354 |
| Minimum | 0.005449591281 |
| Maximum | 17 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 0.005449591281 |
|---|---|
| 5-th percentile | 0.0101010101 |
| Q1 | 0.0200904119 |
| median | 0.04545454545 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 17 |
| Range | 16.99455041 |
| Interquartile range (IQR) | 0.9799095881 |
Descriptive statistics
| Standard deviation | 0.5599672117 |
|---|---|
| Coefficient of variation (CV) | 1.390009571 |
| Kurtosis | 178.2198375 |
| Mean | 0.4028513354 |
| Median Absolute Deviation (MAD) | 0.03354978355 |
| Skewness | 6.706318483 |
| Sum | 1739.914917 |
| Variance | 0.3135632782 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1500 | |
| 2 | 48 | 1.1% |
| 0.02777777778 | 17 | 0.4% |
| 0.0625 | 17 | 0.4% |
| 0.02380952381 | 16 | 0.4% |
| 0.08333333333 | 15 | 0.3% |
| 0.03448275862 | 15 | 0.3% |
| 0.09090909091 | 15 | 0.3% |
| 0.02941176471 | 14 | 0.3% |
| 0.03571428571 | 13 | 0.3% |
| Other values (1216) | 2649 |
| Value | Count | Frequency (%) |
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | |
| 0.005602240896 | 1 | < 0.1% |
| 0.005617977528 | 2 | |
| 0.00566572238 | 1 | < 0.1% |
| 0.005681818182 | 2 | |
| 0.005698005698 | 3 |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 3 | 5 | 0.1% |
| 2 | 48 | 1.1% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 1500 | |
| 0.75 | 1 | < 0.1% |
| 0.6666666667 | 3 | 0.1% |
| 0.550802139 | 1 | < 0.1% |
| 0.5335120643 | 1 | < 0.1% |
returns_no
| Distinct | 205 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.94327391 |
| Minimum | 0 |
| Maximum | 9014 |
| Zeros | 2829 |
| Zeros (%) | 65.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 3 |
| 95-th percentile | 57 |
| Maximum | 9014 |
| Range | 9014 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 233.2881314 |
|---|---|
| Coefficient of variation (CV) | 10.1680402 |
| Kurtosis | 891.6164328 |
| Mean | 22.94327391 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 27.00404142 |
| Sum | 99092 |
| Variance | 54423.35227 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2829 | |
| 1 | 169 | 3.9% |
| 2 | 149 | 3.4% |
| 3 | 105 | 2.4% |
| 4 | 89 | 2.1% |
| 6 | 78 | 1.8% |
| 5 | 61 | 1.4% |
| 12 | 51 | 1.2% |
| 7 | 44 | 1.0% |
| 8 | 43 | 1.0% |
| Other values (195) | 701 | 16.2% |
| Value | Count | Frequency (%) |
| 0 | 2829 | |
| 1 | 169 | 3.9% |
| 2 | 149 | 3.4% |
| 3 | 105 | 2.4% |
| 4 | 89 | 2.1% |
| 5 | 61 | 1.4% |
| 6 | 78 | 1.8% |
| 7 | 44 | 1.0% |
| 8 | 43 | 1.0% |
| 9 | 41 | 0.9% |
| Value | Count | Frequency (%) |
| 9014 | 1 | |
| 8004 | 1 | |
| 4427 | 1 | |
| 3768 | 1 | |
| 3332 | 1 | |
| 2878 | 1 | |
| 2022 | 1 | |
| 2012 | 1 | |
| 1776 | 1 | |
| 1594 | 1 |
satisfaction_rate
| Distinct | 1378 |
|---|---|
| Distinct (%) | 31.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9865749884 |
| Minimum | 0.01369863014 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 0.01369863014 |
|---|---|
| 5-th percentile | 0.9422317612 |
| Q1 | 0.995456996 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 0.9863013699 |
| Interquartile range (IQR) | 0.004543004033 |
Descriptive statistics
| Standard deviation | 0.05521964059 |
|---|---|
| Coefficient of variation (CV) | 0.05597105263 |
| Kurtosis | 79.17949446 |
| Mean | 0.9865749884 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -7.902730168 |
| Sum | 4261.017375 |
| Variance | 0.003049208707 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 2829 | |
| 0.9892473118 | 4 | 0.1% |
| 0.9903381643 | 4 | 0.1% |
| 0.9907407407 | 3 | 0.1% |
| 0.976744186 | 3 | 0.1% |
| 0.9850746269 | 3 | 0.1% |
| 0.9761904762 | 3 | 0.1% |
| 0.987804878 | 3 | 0.1% |
| 0.9756097561 | 3 | 0.1% |
| 0.9976905312 | 3 | 0.1% |
| Other values (1368) | 1461 |
| Value | Count | Frequency (%) |
| 0.01369863014 | 1 | |
| 0.1666666667 | 1 | |
| 0.3666666667 | 1 | |
| 0.3884892086 | 1 | |
| 0.3991163476 | 1 | |
| 0.4035433071 | 1 | |
| 0.4351145038 | 1 | |
| 0.4353612167 | 1 | |
| 0.4397959184 | 1 | |
| 0.4600938967 | 1 |
| Value | Count | Frequency (%) |
| 1 | 2829 | |
| 0.9998830364 | 1 | < 0.1% |
| 0.9998160074 | 1 | < 0.1% |
| 0.9997183099 | 1 | < 0.1% |
| 0.9996859296 | 1 | < 0.1% |
| 0.9996380746 | 1 | < 0.1% |
| 0.9996367599 | 1 | < 0.1% |
| 0.9996362314 | 1 | < 0.1% |
| 0.9996328928 | 1 | < 0.1% |
| 0.9996069182 | 1 | < 0.1% |
recorrence
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.957860616 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 33.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 9 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.516289249 |
|---|---|
| Coefficient of variation (CV) | 0.8507125846 |
| Kurtosis | 2.506514334 |
| Mean | 2.957860616 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.685040046 |
| Sum | 12775 |
| Variance | 6.331711587 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 1 | 1630 | |
| 2 | 919 | |
| 3 | 527 | 12.2% |
| 4 | 388 | 9.0% |
| 5 | 265 | 6.1% |
| 6 | 165 | 3.8% |
| 7 | 101 | 2.3% |
| 8 | 91 | 2.1% |
| 9 | 71 | 1.6% |
| 12 | 58 | 1.3% |
| Other values (2) | 104 | 2.4% |
| Value | Count | Frequency (%) |
| 1 | 1630 | |
| 2 | 919 | |
| 3 | 527 | 12.2% |
| 4 | 388 | 9.0% |
| 5 | 265 | 6.1% |
| 6 | 165 | 3.8% |
| 7 | 101 | 2.3% |
| 8 | 91 | 2.1% |
| 9 | 71 | 1.6% |
| 10 | 55 | 1.3% |
| Value | Count | Frequency (%) |
| 12 | 58 | 1.3% |
| 11 | 49 | 1.1% |
| 10 | 55 | 1.3% |
| 9 | 71 | 1.6% |
| 8 | 91 | 2.1% |
| 7 | 101 | 2.3% |
| 6 | 165 | 3.8% |
| 5 | 265 | |
| 4 | 388 | |
| 3 | 527 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
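The calculation described above can be sketched in plain Python. This is a minimal illustration, assuming tied values share the average of their rank positions, which is a common convention that the text above does not spell out. The function names `average_ranks` and `spearman_rho` are illustrative and not taken from the report's software.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    # The 1/n factors of covariance and standard deviations cancel, so they are omitted.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship still gives rho close to 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because only the ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.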
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
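As a minimal sketch of this calculation, the plain-Python function below computes r and then checks the invariance under changes of location and scale. The name `pearson_r` and the sample data are illustrative assumptions, not taken from the profiled dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations.

    Raises ZeroDivisionError if either variable is constant (zero variance).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)

# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
```

Rescaling by a negative factor flips the sign of r but not its magnitude.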
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
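The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition in plain Python. Statistical libraries commonly use a tie-corrected variant (tau-b) instead, so results on data with ties may differ from the report. The name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs.

    A pair (i, j) is concordant when x and y move in the same direction and
    discordant when they move in opposite directions. Pairs tied in x or y
    count as neither (assumption: no tie correction, i.e. tau-a).
    """
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

The quadratic loop over all pairs is fine for illustration, though faster algorithms exist for large samples.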
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | customer_id | gross_revenue | recency_days | purchases_no | products_no | items_no | frequency | returns_no | satisfaction_rate | recorrence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 5391.21 | 372.0 | 34.0 | 297.0 | 1733.0 | 17.000000 | 40.0 | 0.976919 | 1.0 |
| 1 | 1 | 13047 | 3232.59 | 56.0 | 9.0 | 171.0 | 1390.0 | 0.028302 | 35.0 | 0.974820 | 7.0 |
| 2 | 2 | 12583 | 6705.38 | 2.0 | 15.0 | 232.0 | 5028.0 | 0.040323 | 50.0 | 0.990056 | 11.0 |
| 3 | 3 | 13748 | 948.25 | 95.0 | 5.0 | 28.0 | 439.0 | 0.017921 | 0.0 | 1.000000 | 3.0 |
| 4 | 4 | 15100 | 876.00 | 333.0 | 3.0 | 3.0 | 80.0 | 0.073171 | 22.0 | 0.725000 | 2.0 |
| 5 | 5 | 15291 | 4623.30 | 25.0 | 14.0 | 102.0 | 2102.0 | 0.040115 | 29.0 | 0.986204 | 7.0 |
| 6 | 6 | 14688 | 5630.87 | 7.0 | 21.0 | 327.0 | 3621.0 | 0.057221 | 399.0 | 0.889809 | 11.0 |
| 7 | 7 | 17809 | 5411.91 | 16.0 | 12.0 | 61.0 | 2057.0 | 0.033520 | 41.0 | 0.980068 | 8.0 |
| 8 | 8 | 15311 | 60767.90 | 0.0 | 91.0 | 2379.0 | 38194.0 | 0.243316 | 474.0 | 0.987590 | 12.0 |
| 9 | 9 | 16098 | 2005.63 | 87.0 | 7.0 | 67.0 | 613.0 | 0.024390 | 0.0 | 1.000000 | 7.0 |
Last rows
| | df_index | customer_id | gross_revenue | recency_days | purchases_no | products_no | items_no | frequency | returns_no | satisfaction_rate | recorrence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4309 | 4336 | 16000 | 12393.70 | 2.0 | 3.0 | 9.0 | 5110.0 | 3.0 | 0.0 | 1.000000 | 1.0 |
| 4310 | 4337 | 15195 | 3861.00 | 2.0 | 1.0 | 1.0 | 1404.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4311 | 4338 | 14087 | 194.42 | 2.0 | 1.0 | 69.0 | 251.0 | 1.0 | 1.0 | 0.996016 | 1.0 |
| 4312 | 4339 | 14204 | 161.03 | 2.0 | 1.0 | 44.0 | 82.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4313 | 4340 | 15471 | 469.48 | 2.0 | 1.0 | 77.0 | 266.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4314 | 4341 | 13436 | 196.89 | 1.0 | 1.0 | 12.0 | 76.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4315 | 4342 | 15520 | 343.50 | 1.0 | 1.0 | 18.0 | 314.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4316 | 4343 | 13298 | 360.00 | 1.0 | 1.0 | 2.0 | 96.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4317 | 4344 | 14569 | 227.39 | 1.0 | 1.0 | 12.0 | 79.0 | 1.0 | 0.0 | 1.000000 | 1.0 |
| 4318 | 4345 | 12713 | 794.55 | 0.0 | 1.0 | 37.0 | 505.0 | 1.0 | 0.0 | 1.000000 | 1.0 |